Resources

https://www.bls.gov/tus/ehmintcodebk1416.pdf

https://www.kaggle.com/bls/eating-health-module-dataset

https://www.bls.gov/tus/ehdatafiles.htm

https://www.ers.usda.gov/publications/pub-details/?pubid=42817

Dataset

The data come from a U.S. Department of Agriculture survey on eating habits and health, with responses collected in 2014. The variables cover:

  1. primary eating

  2. secondary eating (eating while performing an activity)

  3. grocery shopping

  4. meal preparation

  5. food assistance participation

  6. general health, height, and weight

  7. household income


Exploratory Analysis

Data Availability

There are a total of 11212 samples in the dataset. Here is the data availability for each variable.
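A per-variable availability count of this kind could be computed along the following lines. This is a minimal Python sketch, not the code behind the report. It assumes that missing or skipped responses are stored as negative codes; the sample rows and the `EXERCISE_FREQ` column name are invented for illustration, while `EUWGT` and `EUHGT` are the weight and height variables named later in this report.

```python
# Hypothetical mini-sample: the values are invented, and negative codes
# are ASSUMED to mark missing, skipped, or refused responses.
rows = [
    {"EUWGT": 170, "EUHGT": 66, "EXERCISE_FREQ": 3},
    {"EUWGT": -2, "EUHGT": 70, "EXERCISE_FREQ": -1},
    {"EUWGT": 145, "EUHGT": -3, "EXERCISE_FREQ": -1},
]


def availability(rows):
    """Count, per variable, how many rows hold a valid (non-negative) value."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            is_valid = value is not None and value >= 0
            counts[name] = counts.get(name, 0) + int(is_valid)
    return counts


avail = availability(rows)
for name, count in avail.items():
    print(f"{name}: {count} of {len(rows)} rows available")
```

Dividing each count by the total number of rows gives the share of usable responses per variable.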


Correlation Matrix

Here is a plot of the correlation matrix for the variables.


Positive Correlation

Below is a list of the 20 most positively correlated pairs of variables.


Negative Correlation

Below is a list of the 20 most negatively correlated pairs of variables.
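Rankings like the two above can be obtained by computing a correlation coefficient for every pair of variables and sorting the pairs. The sketch below assumes Pearson correlation and complete data (no missing values); the column names and numbers are invented for illustration and are not taken from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranked_pairs(columns):
    """Return (r, name_a, name_b) for every pair, most positive first."""
    pairs = [
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, reverse=True)


# Invented example columns (hypothetical names and values).
columns = {
    "weight": [150, 180, 200, 130, 220],
    "bmi": [24.0, 27.5, 30.1, 22.0, 33.0],
    "exercise_freq": [5, 3, 1, 6, 0],
}

ranked = ranked_pairs(columns)
most_positive = ranked[0]    # head of the list: strongest positive pair
most_negative = ranked[-1]   # tail of the list: strongest negative pair
```

Taking the first 20 and last 20 entries of `ranked` would give the two lists; with real survey data, missing responses would need to be dropped pairwise first.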


One-Dimensional Analysis

BMI

Sample Count: 10637

Calculation: BMI = weight (kg) / [height (m)]²

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.00   23.60   26.60   27.77   30.70   73.60
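Because weight is self-reported in pounds and height in inches, the formula needs a unit conversion first. The sketch below shows one way to do it; the function name is illustrative, and the example inputs are simply the median weight and height reported in this document, so the result is not expected to equal the median BMI.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # metres per inch (exact by definition)


def bmi(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2


# 170 lb and 66 in are the sample medians for weight and height.
example_bmi = bmi(170, 66)
print(round(example_bmi, 1))  # about 27.4
```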

Weight

Sample Count: 10712

Self-Reported Response: How much do you weigh without shoes? (in pounds)

Note: EUWGT is bottomcoded to 98 lbs and topcoded to 340 lbs.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    98.0   145.0   170.0   176.2   200.0   340.0

Height

Sample Count: 11051

Self-Reported Response: How tall are you without shoes? (in inches)

Note: EUHGT is bottomcoded to 56 inches and topcoded to 77 inches.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   56.00   64.00   66.00   66.62   70.00   77.00
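Bottom- and top-coding means that any value outside the stated range is recorded as the nearest bound, which is why the minimum and maximum in the weight and height summaries equal the coding limits exactly. A minimal sketch of the idea, using the bounds from the two notes above and invented input values:

```python
def censor(value, low, high):
    """Clamp a value into [low, high], mimicking bottom- and top-coding."""
    return max(low, min(high, value))


WEIGHT_BOUNDS = (98, 340)  # pounds, from the EUWGT note
HEIGHT_BOUNDS = (56, 77)   # inches, from the EUHGT note

# Invented raw answers: one below, one inside, and one above each range.
coded_weight = [censor(w, *WEIGHT_BOUNDS) for w in (92, 170, 365)]
coded_height = [censor(h, *HEIGHT_BOUNDS) for h in (54, 66, 80)]

print(coded_weight)  # [98, 170, 340]
print(coded_height)  # [56, 66, 77]
```

One consequence is that BMI values computed for respondents at either bound are approximations, since the true weight or height is unknown.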

Health

Sample Count: 11128

Self-Reported Response: In general, would you say that your physical health was excellent, very good, good, fair, or poor?


Exercise

Self-Reported Response: During the past 7 days, did you participate in any physical activities or exercises for fitness and health such as running, bicycling, working out in a gym, walking for exercise, or playing sports? (Sample Count: 11155)

Self-Reported Response: How many times over the past 7 days did you take part in these activities? (Sample Count: 6993)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   4.000   4.193   5.000  38.000

Enough Food

Self-Reported Response: Which of the following statements best describes the amount of food eaten in your household in the last 30 days - enough food to eat, sometimes not enough to eat, or often not enough to eat? (Sample Count: 11161)


Fast Food

Self-Reported Response: Thinking back over the last 7 days, did you purchase any: prepared food from a deli, carry-out, delivery food, or fast food? (Sample Count: 11169)

Self-Reported Response: How many times in the last 7 days did you purchase: prepared food from a deli, carry-out, delivery food, or fast food? (Sample Count: 6440)


Grocer

Self-Reported Response: Where do you get the majority of your groceries? (Sample Count: 8208)

Self-Reported Response: What is the primary reason you shop there? (Sample Count: 8131)


Drink

Self-Reported Response: Not including plain water, were there any other times yesterday when you were drinking any beverages? (Sample Count: 11202)

Self-Reported Response: Were any of the beverages soft drinks such as cola, root beer, or ginger ale? (Sample Count: 7513)

Self-Reported Response: Was the soft drink diet, regular or did you have both kinds? (Sample Count: 3037)


Primary Eating

Sample Count: 10725

Self-Reported Response: Total amount of time spent in primary eating and drinking (in minutes)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   30.00   60.00   68.66   90.00  508.00
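The six-number summaries shown throughout this section (minimum, quartiles, median, mean, maximum) can be reproduced with a few lines of code. The Python sketch below uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics; the `minutes` list is invented and is not drawn from the survey.

```python
from statistics import mean, quantiles


def summary(values):
    """Min, quartiles, median, mean, and max of a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Invented eating times in minutes, with one long outlier.
minutes = [1, 30, 45, 60, 60, 90, 120, 508]
s = summary(minutes)
for label, value in s.items():
    print(f"{label:>8}: {value}")
```

As in the survey data, a single very large value pulls the mean well above the median, which is why both are worth reporting.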

Secondary Eating

Sample Count: 6061

Self-Reported Response: Total amount of time spent in secondary eating and drinking (in minutes)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   10.00   15.00   31.02   30.00  990.00

Income

Self-Reported Response: Last month, was your total household income before taxes more or less than (amount) per month? (Sample Count: 10932)


SNAP

Self-Reported Response: In the past 30 days, did you or any member of this household receive SNAP or food stamp benefits? (Sample Count: 11135)


WIC

Self-Reported Response: In the last 30 days, did you or any member of your household receive benefits from the WIC program, that is, the Women, Infants, and Children program? (Sample Count: 5805)


Employment

Self-Reported Response: Change in spouse or unmarried partner’s labor force status or full time or part time employment status between CPS and ATUS (Sample Count: 5677)


BMI versus…

Weight



Height



Health


Exercise

The independent-samples t-test below (Welch's version, which does not assume equal variances) shows with high confidence that people who exercise have, on average, a lower BMI than people who don’t exercise (27.16 versus 28.81).

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -12.917, df = 7218.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.904665 -1.402735
## sample estimates:
## mean of x mean of y 
##  27.15812  28.81182
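For readers who want to see what the Welch test computes, the sketch below derives the t statistic and the Welch-Satterthwaite degrees of freedom from two samples. The BMI values are invented, and the p-value is omitted because the Python standard library has no t-distribution; this is an illustration of the formula, not the code behind the output above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Invented BMI values for two small groups.
exercisers = [22.1, 24.5, 26.0, 27.3, 25.2, 23.8]
non_exercisers = [27.9, 31.2, 26.4, 33.0, 29.5]

t_stat, df = welch_t(exercisers, non_exercisers)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A negative t statistic here means the first group's mean is lower than the second's, matching the sign convention in the output above.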


Enough Food


Fast Food



Grocer


Drink


Primary Eating



Secondary Eating



Income


SNAP

The independent-samples t-test below shows with high confidence that people in households receiving SNAP benefits have, on average, a higher BMI than people in households that do not (29.50 versus 27.58).

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 8.3221, df = 1261.2, p-value = 2.221e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.469371 2.375842
## sample estimates:
## mean of x mean of y 
##  29.49899  27.57639

WIC

The independent-samples t-test below shows with high confidence that people in households receiving WIC benefits have, on average, a higher BMI than people in households that do not (29.17 versus 27.46).

## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 4.57, df = 397.73, p-value = 6.519e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.9762109 2.4502044
## sample estimates:
## mean of x mean of y 
##  29.16994  27.45674

Employment


Multiple Linear Regression

Predicting BMI

Linear Models Predicting BMI
Dependent variable: BMI

                     (1)         (2)         (3)         (4)         (5)         (6)         (7)
Exercise_Freq        -0.280***   -0.271***               -0.137***   -0.278***   -0.140***   -0.139***
                     (0.025)     (0.025)                 (0.023)     (0.025)     (0.023)     (0.023)
SNAP                             -1.856***                                                   -0.677**
                                 (0.235)                                                     (0.235)
Excellent_Health                             -5.767***   -5.443***
                                             (0.397)     (0.401)
Very_Good_Health                             -4.052***   -3.843***
                                             (0.395)     (0.397)
Good_Health                                  -1.580***   -1.458***
                                             (0.402)     (0.403)
Fair_Health                                  0.108       0.167
                                             (0.438)     (0.438)
Fast_Food_Freq                                                       0.147***
                                                                     (0.029)
Health                                                                           1.748***    1.708***
                                                                                 (0.061)     (0.061)
Constant             28.538***   32.035***   30.716***   30.903***   28.294***   23.799***   25.183***
                     (0.093)     (0.459)     (0.385)     (0.387)     (0.105)     (0.167)     (0.503)
Observations         10,221      10,221      10,221      10,221      10,221      10,221      10,221
R²                   0.018       0.026       0.104       0.108       0.020       0.103       0.104
Adjusted R²          0.018       0.026       0.104       0.108       0.020       0.103       0.104
Residual Std. Error  6.119       6.093       5.844       5.831       6.111       5.847       5.844
Residual df          10219       10218       10216       10215       10218       10218       10217
Note: *p<0.05; **p<0.01; ***p<0.001
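As a worked reading of the coefficients, model (1) corresponds to the fitted line BMI = 28.538 - 0.280 × Exercise_Freq. The sketch below evaluates that line for a few exercise frequencies. The function name is illustrative, and the chosen frequencies are arbitrary examples rather than values highlighted in the analysis.

```python
# Coefficients of model (1) from the table above.
CONSTANT = 28.538
EXERCISE_COEF = -0.280


def predicted_bmi(exercise_freq):
    """Predicted BMI under model (1) for a weekly exercise frequency."""
    return CONSTANT + EXERCISE_COEF * exercise_freq


predictions = {freq: predicted_bmi(freq) for freq in (0, 4, 7)}
for freq, value in predictions.items():
    print(f"{freq} sessions per week: predicted BMI {value:.3f}")
```

Each additional weekly session lowers the predicted BMI by 0.28 points, but with an R² of 0.018 the model explains very little of the variation between individuals.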

AIC (1) = 6.604 × 10^4

AIC (2) = 6.595 × 10^4

AIC (3) = 6.51 × 10^4

AIC (4) = 6.506 × 10^4

AIC (5) = 6.601 × 10^4

AIC (6) = 6.511 × 10^4

AIC (7) = 6.51 × 10^4
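The AIC values can be cross-checked against the regression table. The sketch below assumes the usual Gaussian log-likelihood for a least-squares fit, with the residual variance counted as one extra parameter; this is an assumption about how the values were produced, not something the report states. Using the residual standard error (6.119), residual degrees of freedom (10219), and observation count (10,221) of model (1), it gives a value close to 6.604 × 10^4.

```python
from math import log, pi


def aic_from_rse(rse, resid_df, n_obs):
    """AIC of a least-squares fit, rebuilt from its residual standard error.

    ASSUMES a Gaussian likelihood and counts the residual variance as a
    parameter in addition to the regression coefficients.
    """
    rss = rse ** 2 * resid_df        # residual sum of squares
    n_coef = n_obs - resid_df        # coefficients, including the intercept
    k = n_coef + 1                   # plus the residual variance
    loglik = -0.5 * n_obs * (log(2 * pi) + log(rss / n_obs) + 1)
    return -2 * loglik + 2 * k


aic_model_1 = aic_from_rse(6.119, 10219, 10221)
print(f"AIC (1) is approximately {aic_model_1:.0f}")
```

Because the residual standard error in the table is rounded to three decimals, the result can differ from the original value by a few units. Lower AIC indicates a better trade-off between fit and complexity, so model (4) is preferred among the seven.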


Predicting Health

Linear Models Predicting Health
Dependent variable: Health

                     (1)         (2)         (3)         (4)
Exercise_Freq        -0.081***                           -0.067***
                     (0.005)                             (0.004)
BMI                              0.054***                0.050***
                                 (0.002)                 (0.002)
Fast_Food_Freq                               -0.022***   -0.032***
                                             (0.005)     (0.004)
Constant             2.711***    0.996***    2.533***    1.327***
                     (0.016)     (0.048)     (0.014)     (0.051)
Observations         10,221      10,221      10,221      10,221
R²                   0.050       0.099       0.002       0.137
Adjusted R²          0.050       0.099       0.002       0.137
Residual Std. Error  1.033       1.006       1.058       0.984
Residual df          10219       10219       10219       10217
Note: *p<0.05; **p<0.01; ***p<0.001

AIC (1) = 2.967 × 10^4

AIC (2) = 2.912 × 10^4

AIC (3) = 3.017 × 10^4

AIC (4) = 2.869 × 10^4


K-Means Clustering